16 research outputs found

    Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach

    BACKGROUND: Medical terms are a major obstacle for patients trying to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first. OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation, that is, creating lay definitions for these terms. METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P < .001 for all measures and all conditions). Using a rich set of learning features contributed substantially to ADS's performance. CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
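The abstract above reports average precision as its ranking metric. As a rough illustration only, the sketch below computes average precision for a ranked list of candidate terms with binary importance labels. The function name and the toy scores are assumptions made for this example and are not taken from the ADS system.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@k taken at every rank k that holds a positive item.

    `scores` are ranker outputs (higher means more important) and `labels`
    are 1 for important terms and 0 otherwise. Both are hypothetical inputs.
    """
    # Sort items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # With no positive items the metric is undefined; return 0.0 by convention here.
    return precision_sum / hits if hits else 0.0


# Toy example: positives sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
toy_ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

A ranker that places every important term ahead of every unimportant one scores 1.0 under this definition.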

    Automatic extraction of informal topics from online suicidal ideation

    Background: Suicide is an alarming public health problem accounting for a considerable number of deaths each year worldwide. Many more individuals contemplate suicide. Understanding the attributes, characteristics, and exposures correlated with suicide remains an urgent and significant problem. As social networking sites have become more common, users have adopted these sites to talk about intensely personal topics, among them their thoughts about suicide. Such data has previously been evaluated by analyzing the language features of social media posts and using factors derived by domain experts to identify at-risk users. Results: In this work, we automatically extract informal latent recurring topics of suicidal ideation found in social media posts. Our evaluation demonstrates that we are able to automatically reproduce many of the expertly determined risk factors for suicide. Moreover, we identify many informal latent topics related to suicidal ideation, such as concerns over health, work, self-image, and financial issues. Conclusions: These informal topics can be more specific or more general. Some of our topics express meaningful ideas not contained in the risk factors, and some risk factors do not have complementary latent topics. In short, our analysis of the latent topics extracted from social media containing suicidal ideation suggests that users of these systems express ideas that are complementary to the topics defined by experts but differ in their scope, focus, and precision of language. https://deepblue.lib.umich.edu/bitstream/2027.42/144214/1/12859_2018_Article_2197.pd

    Unlocking echocardiogram measurements for heart disease research through natural language processing

    Background: In order to investigate the mechanisms of cardiovascular disease in HIV infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study. Implementation: A natural language processing system using a dictionary lookup, rules, and patterns was developed to extract heart function measurements that are typically recorded in echocardiogram reports as measurement-value pairs. Curated semantic bootstrapping was used to create a custom dictionary that extends existing terminologies based on terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built utilizing a scalable framework, making it available for processing large datasets. Results: The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. The system achieved F-scores of 0.872, 0.844, and 0.877 with precision of 0.936, 0.982, and 0.969 for each dataset, respectively, averaged across all extracted values. Left ventricular ejection fraction (LVEF) is the most frequently extracted measurement. The precision of extraction of the LVEF measure ranged from 0.968 to 1.0 across different document types. Conclusions: This system illustrates the feasibility and effectiveness of large-scale information extraction on clinical data. New clinical questions can be addressed in the domain of heart failure using retrospective clinical data analysis because key heart function measurements can be successfully extracted using natural language processing.
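The abstract above describes extracting measurement-value pairs with dictionary lookup, rules, and patterns. The following is only a minimal sketch of the pattern-matching idea for a single measurement (LVEF). The term list, the regular expression, and the function name are assumptions for illustration, and the sketch does not reproduce the published system's dictionary or its disambiguation method.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a measurement term, up to 30 filler characters,
# then a percentage value or a percentage range such as "55-60%".
LVEF_PATTERN = re.compile(
    r"\b(?:LVEF|left ventricular ejection fraction|ejection fraction|EF)\b"
    r"[^0-9%]{0,30}?"                              # short filler such as " is estimated at "
    r"(\d{1,3}(?:\.\d+)?)"                         # first (or only) value
    r"(?:\s*(?:-|to)\s*(\d{1,3}(?:\.\d+)?))?"      # optional upper bound of a range
    r"\s*%",
    re.IGNORECASE,
)


def extract_lvef(text: str) -> List[Tuple[float, float]]:
    """Return (low, high) percentage pairs; low equals high for single values."""
    pairs = []
    for match in LVEF_PATTERN.finditer(text):
        low = float(match.group(1))
        high = float(match.group(2)) if match.group(2) else low
        pairs.append((low, high))
    return pairs


# Toy note text, invented for this example.
example_pairs = extract_lvef("LVEF is estimated at 55-60%. Prior ejection fraction: 45 %.")
```

A pattern this simple will also match unrelated uses of an abbreviation such as "EF", which is the kind of ambiguity the semantic-constraint disambiguation step in the abstract is meant to handle.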

    Understanding headache classification coding within the veterans health administration using ICD-9-CM and ICD-10-CM in fiscal years 2014-2017.

    Objectives: Understand the continuity and changes in headache not-otherwise-specified (NOS), migraine, and post-traumatic headache (PTH) diagnoses after the transition from ICD-9-CM to ICD-10-CM in the Veterans Health Administration (VHA). Background: Headache is one of the most commonly diagnosed chronic conditions managed within primary and specialty care clinics. The VHA transitioned from ICD-9-CM to ICD-10-CM on October 1, 2015. The effect of the transition on coding of specific headache diagnoses is unknown. Accuracy of headache diagnosis is important since different headache types respond to different treatments. Methods: We mapped headache diagnoses from ICD-9-CM (FY 2014/2015) onto ICD-10-CM (FY 2016/2017) and computed coding proportions two years before/after the transition in VHA. We used queries to determine the change in transition pathways. We report the odds of ICD-10-CM coding associated with ICD-9-CM controlling for provider type, and patient age, sex, and race/ethnicity. Results: Only 37%, 58%, and 34% of patients with ICD-9-CM coding of NOS, migraine, and PTH, respectively, had an ICD-10-CM headache diagnosis. Of those with an ICD-10-CM diagnosis, 73-79% had a single headache diagnosis. The odds ratios for receiving the same code in both ICD-9-CM and ICD-10-CM after adjustment for ICD-9-CM and ICD-10-CM headache comorbidities and sociodemographic factors were high (range 6-26) and statistically significant. Specifically, 75% of patients with headache NOS had received one headache diagnosis (Adjusted headache NOS-ICD-9-CM OR for headache NOS-ICD-10-CM = 6.1, 95% CI 5.89-6.32). 79% of migraineurs had one headache diagnosis, mostly migraine (Adjusted migraine-ICD-9-CM OR for migraine-ICD-10-CM = 26.43, 95% CI 25.51-27.38). The same held true for PTH (Adjusted PTH-ICD-9-CM OR for PTH-ICD-10-CM = 22.92, 95% CI 18.97-27.68). These strong associations remained after adjustment for specialist care in the ICD-10-CM follow-up period. Discussion: The majority of people with ICD-9-CM headache diagnoses did not have an ICD-10-CM headache diagnosis. However, a given diagnosis in ICD-9-CM by a primary care provider (PCP) was significantly predictive of its assignment in ICD-10-CM, as was seeing either a neurologist or physiatrist (compared to a generalist) for an ICD-10-CM headache diagnosis. Conclusion: When a veteran had a specific diagnosis in ICD-9-CM, the odds of being coded with the same diagnosis in ICD-10-CM were significantly higher. A specialist visit during the ICD-10-CM period was independently associated with all three ICD-10-CM headache diagnoses.
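The odds ratios in the abstract above are adjusted estimates, presumably from a regression model that the abstract does not detail. As a simpler, clearly different illustration, the sketch below computes an unadjusted odds ratio and a Wald 95% confidence interval from a 2x2 table. The counts are invented for the example and have no connection to the VHA data.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four cells must be nonzero. z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: (20 * 20) / (10 * 10) gives an odds ratio of 4.0.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 10, 10, 20)
```

An adjusted odds ratio would instead come from exponentiating a coefficient in a model that includes the covariates, so the numbers from this unadjusted calculation are not comparable to those in the abstract.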

    Panel-A: Headache NOS coding.

    Panel-B: Migraine headache coding. Panel-C: PTH headache coding in FY2016/2017. Abbreviations: Not-otherwise-specified (NOS); Trigeminal autonomic cephalalgia (TAC); Post-traumatic headache (PTH).